Goto

Collaborating Authors

 bayesian decision theory


Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

arXiv.org Machine Learning

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on Neural Embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.


On-the-Job Learning with Bayesian Decision Theory

Neural Information Processing Systems

Our goal is to deploy a high-accuracy system starting with zero training examples. We consider an on-the-job setting, where as inputs arrive, we use real-time crowd-sourcing to resolve uncertainty where needed and output our prediction when confident. As the model improves over time, the reliance on crowdsourcing queries decreases. We cast our setting as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. Computing the optimal policy is intractable, so we develop an approximation based on Monte Carlo Tree Search. We tested our approach on three datasets--named-entity recognition, sentiment classification, and image classification. On the NER task we obtained more than an order of magnitude reduction in cost compared to full human annotation, while boosting performance relative to the expert provided labels.


On-the-Job Learning with Bayesian Decision Theory

Neural Information Processing Systems

Our goal is to deploy a high-accuracy system starting with zero training examples. We consider an "on-the-job" setting, where as inputs arrive, we use real-time crowdsourcing to resolve uncertainty where needed and output our prediction when confident. As the model improves over time, the reliance on crowdsourcing queries decreases. We cast our setting as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. Computing the optimal policy is intractable, so we develop an approximation based on Monte Carlo Tree Search. We tested our approach on three datasets-- named-entity recognition, sentiment classification, and image classification. On the NER task we obtained more than an order of magnitude reduction in cost compared to full human annotation, while boosting performance relative to the expert provided labels. We also achieve a 8% F1 improvement over having a single human label the whole set, and a 28% F1 improvement over online learning.


Making Reliable and Flexible Decisions in Long-tailed Classification

arXiv.org Machine Learning

Long-tailed classification is challenging due to its heavy imbalance in class probabilities. While existing methods often focus on overall accuracy or accuracy for tail classes, they overlook a critical aspect: certain types of errors can carry greater risks than others in real-world long-tailed problems. For example, misclassifying patients (a tail class) as healthy individuals (a head class) entails far more serious consequences than the reverse scenario. To address this critical issue, we introduce Making Reliable and Flexible Decisions in Long-tailed Classification (RF-DLC), a novel framework aimed at reliable predictions in long-tailed problems. Leveraging Bayesian Decision Theory, we introduce an integrated gain to seamlessly combine long-tailed data distributions and the decision-making procedure. We further propose an efficient variational optimization strategy for the decision risk objective. Our method adapts readily to diverse utility matrices, which can be designed for specific tasks, ensuring its flexibility for different problem settings. In empirical evaluation, we design a new metric, False Head Rate, to quantify tail-sensitivity risk, along with comprehensive experiments on multiple real-world tasks, including large-scale image classification and uncertainty quantification, to demonstrate the reliability and flexibility of our method.


Long-tailed Classification from a Bayesian-decision-theory Perspective

arXiv.org Artificial Intelligence

Long-tailed classification poses a challenge due to its heavy imbalance in class probabilities and tail-sensitivity risks with asymmetric misprediction costs. Recent attempts have used re-balancing loss and ensemble methods, but they are largely heuristic and depend heavily on empirical results, lacking theoretical explanation. Furthermore, existing methods overlook the decision loss, which characterizes different costs associated with tailed classes. This paper presents a general and principled framework from a Bayesian-decision-theory perspective, which unifies existing techniques including re-balancing and ensemble methods, and provides theoretical justifications for their effectiveness. From this perspective, we derive a novel objective based on the integrated risk and a Bayesian deep-ensemble approach to improve the accuracy of all classes, especially the "tail". Besides, our framework allows for task-adaptive decision loss which provides provably optimal decisions in varying task scenarios, along with the capability to quantify uncertainty. Finally, We conduct comprehensive experiments, including standard classification, tail-sensitive classification with a new False Head Rate metric, calibration, and ablation studies. Our framework significantly improves the current SOTA even on large-scale real-world datasets like ImageNet.


Active inference, Bayesian optimal design, and expected utility

arXiv.org Machine Learning

Active inference, a corollary of the free energy principle, is a formal way of describing the behavior of certain kinds of random dynamical systems that have the appearance of sentience. In this chapter, we describe how active inference combines Bayesian decision theory and optimal Bayesian design principles under a single imperative to minimize expected free energy. It is this aspect of active inference that allows for the natural emergence of information-seeking behavior. When removing prior outcomes preferences from expected free energy, active inference reduces to optimal Bayesian design, i.e., information gain maximization. Conversely, active inference reduces to Bayesian decision theory in the absence of ambiguity and relative risk, i.e., expected utility maximization. Using these limiting cases, we illustrate how behaviors differ when agents select actions that optimize expected utility, expected information gain, and expected free energy. Our T-maze simulations show optimizing expected free energy produces goal-directed information-seeking behavior while optimizing expected utility induces purely exploitive behavior and maximizing information gain engenders intrinsically motivated behavior.


On-the-Job Learning with Bayesian Decision Theory

Neural Information Processing Systems

Our goal is to deploy a high-accuracy system starting with zero training examples. We consider an "on-the-job" setting, where as inputs arrive, we use real-time crowdsourcing to resolve uncertainty where needed and output our prediction when confident. As the model improves over time, the reliance on crowdsourcing queries decreases. We cast our setting as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. Computing the optimal policy is intractable, so we develop an approximation based on Monte Carlo Tree Search.


Loss-Calibrated Approximate Inference in Bayesian Neural Networks

arXiv.org Machine Learning

Current approaches in approximate inference for Bayesian neural networks minimise the Kullback-Leibler divergence to approximate the true posterior over the weights. However, this approximation is without knowledge of the final application, and therefore cannot guarantee optimal predictions for a given task. To make more suitable task-specific approximations, we introduce a new loss-calibrated evidence lower bound for Bayesian neural networks in the context of supervised learning, informed by Bayesian decision theory. By introducing a lower bound that depends on a utility function, we ensure that our approximation achieves higher utility than traditional methods for applications that have asymmetric utility functions. Furthermore, in using dropout inference, we highlight that our new objective is identical to that of standard dropout neural networks, with an additional utility-dependent penalty term. We demonstrate our new loss-calibrated model with an illustrative medical example and a restricted model capacity experiment, and highlight failure modes of the comparable weighted cross entropy approach. Lastly, we demonstrate the scalability of our method to real world applications with per-pixel semantic segmentation on an autonomous driving data set.


Basics of Bayesian Decision Theory

@machinelearnbot

The use of formal statistical methods to analyse quantitative data in data science has increased considerably over the last few years. One such approach, Bayesian Decision Theory (BDT), also known as Bayesian Hypothesis Testing and Bayesian inference, is a fundamental statistical approach that quantifies the tradeoffs between various decisions using distributions and costs that accompany such decisions. In pattern recognition it is used for designing classifiers making the assumption that the problem is posed in probabilistic terms, and that all of the relevant probability values are known. Generally, we don't have such perfect information but it is a good place to start when studying machine learning, statistical inference, and detection theory in signal processing. BDT also has many applications in science, engineering, and medicine.


Bayesian Decision Theory Made Ridiculously Simple · Statistics @ Home

@machinelearnbot

So how do we determine the "best" decision? This requires that we first define some notion of what we want (what are we trying to do?). The formal object that we use to do this goes by many names depending on the field: I will refer to it as a Loss function (\(\mathcal{L}\)) but the same general concept may be alternatively called a cost function, a utility function, an acquisition function, or any number of different things. The crucial idea is that this is a function that allows us to quantify how bad/good a given decision (\(a\)) is given some information (\(\theta\)). What does it mean to quantify?